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Abstract 

Mutations in the RNase Illb domain of DICER1 are known 
to disrupt processing of 5p-strand pre-miRNAs and these muta- 
tions have previously been associated with cancer. Using data 
from the Cancer Genome Atlas project, we show that these muta- 
tions are recurrent across four cancer types and that a previously 
uncharacterized recurrent mutation in the adjacent RNase Ilia 
domain also disrupts 5p-strand miRNA processing. Analysis of 
the downstream effects of the resulting imbalance 5p/3p shows 
a statistically significant effect on the expression of mRNAs tar- 
geted by major conserved miRNA families. In summary, these 
mutations in DICER1 lead to an imbalance in miRNA strands, 
which has an effect on mRNA transcript levels that appear to 
contribute to the oncogenesis. 
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Brief Communication 

MicroRNAs (miRNAs) are small non-coding RNA molecules that regulate 
expression of their transcript targets [T] DICER1 is a key enzyme that is 
responsible for cutting the 5p and 3p strands of the pre-miRNA in the early 
stages of the miRNA biogenesis. Processing of the 5p and 3p strands, which 
is carried out by the RNase III domains of DICER1, is necessary for loading 
the functional miRNA strand into the RISC complex. Previous studies have 
identified recurrent mutations in the RNase Illb domain in different cancer 
types [2[2H[g|6j[7im[9]. These mutations (at residiues E1813, D1810, 
D1709, E1705 and R1703) were shown to be in the active site of the enzyme 
and were proven to disrupt the processing of the 5p stand of the miRNA [TU] . 
Others have shown that hotspot mutations in the RNase Illb domain cause 
depletion of 5p strands relative to their corresponding 3p strands, leading 
to an asymmetry in the abundance of the two [7] . 

Although the asymmetry in the miRNA processing due to hotspot muta- 
tions has been characterized using model organisms; the effect of this miRNA 
depletion on the mRNA levels have not been studied extensively in the con- 
text of the human tumors. It is, for example, unknown whether it is the 
5p-strand depletion or increased 3p-strand accessibility that promotes the 
cancer. In either of the cases, it is also unknown whether there is any partic- 
ular miRNA or miRNA family of which depletion or over-expression drives 
this phenotype. In this study, using human tumor data from the Cancer 
Genome Atlas (TCGA) project, we wanted to better characterize the effects 
of DICER1 mutations on miRNA and mRNA profiles of the patients. 

We first asked whether we could observe the asymmetry in the miRNA 
processing using the miRNA-Seq data. For this, we looked whether any of 
the previously identified hotspot mutations were present in the TCGA data 
set (14 cancer types, 5535 sequenced samples). We found that 15 out of 
123 DICER1 mutants carried a mutation in the RNase Illb domain of the 
protein at a previously identified hotspot (Figure [iji) . After filtering out 
cases that were hyper-mutated and samples that did not have miRNA-Seq 
data available, we were left with 8 DICER1 hotspot mutants. We then 
compared the miRNA levels in these hotspot mutants to the miRNA levels 
in 3171 DICER1 wildtype tumors across multiple cancers. Confirming the 
results of the previous studies, we saw 5p strand miRNAs were relatively 
down-regulated in mutants and the changes in the expression of 5p strands 
were significantly different than the 3p strands (Wilcoxon rank sum test; 
p < 1CT 29 ; Figure [lb-c). 

Having observed a phenotype characterized by relative 5p strand deple- 
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tion in hotspot RNase Illb mutants, we asked whether any of the other 
DICER1 mutants had a similar phenotype. To investigate this, we first es- 
timated the abundance of 5p strands relative to 3p strands for each patient: 
ml 3 = log 2 (m5/m 3 ), where m l x is the median expression of the x-strand 
miRNAs in patient i. As expected, the majority of the hotspot mutants 
had exceptionally low 5p-strand abundance compared to DICER 1 wildtypes 
(Figure [J). 

In addition to the known hotspots mutants, we identified three more 
DICER1 mutant cases that had relatively low 5p abundance (m\^ < 0). 
One of these three DICER1 mutants had a hotspot mutation in its RNase 
Illb domain, but was excluded from the initial analysis because it was a in 
hyper-mutated sample (Table |S2j ) . Surprisingly, the other two cases with 
low 5p abundance had an S1344L mutation in the RNase Ilia domain that 
is responsible for processing the 3p strand of the miRNA. 

As the observation of recurrent mutations in cancer samples is consistent 
with a selective functional impact of the mutation, the question arises as to 
the effect of the S1344L mutations on the catalytic function of the RNase do- 
mains. Inspection of the 3D structure (or model) of the individual domain 
reveals that residue S1344L (in domain Ilia) and its homologous residue 
T1733 (in domain Illb) are far from the active site residues (19.60±2.62A 
distance) in their respective domains (Figure [lj). However, evolutionary 
couplings [T2] between S1344L/T1733 and the active site residues, as de- 
duced from co-evolution patterns in the multiple sequence alignment of 
RNase Ill-like domains, are fairly strong. The contradiction is resolved 
by inspection of the model of the RNase Ilia - Illb heterodimer (as inferred 
from the crystal structure of the RNase Illb homodimer) [TU]. In the het- 
erodimer, S1344L in domain Ilia is close (11.72±1.98A distance) to active 
site of domain Illb (residues E1813, D1810, D1709, E1705 and R1703) and 
T1733 in domain Illb is close to the active site residues of domain Ilia. 
These residue arrangements and functional couplings are beautifully consis- 
tent with the observation that mutations in S1344L in domain Ilia affect 
5p processing, as observed in our analysis of the effect of these mutations 
on the balance of 3p/5p miRNA expression profiles in cancer samples. This 
is consistent with the earlier observations that mutations in the active site 
residues of domain Ilia affect 3p processing, while mutations in the active 
site residues of domain Illb affect 5p processing. The subtly of the difference 
between the earlier and current observation lies in the residue interactions 
across the heterodimer interface [13 and in fact the earlier observation of 
3p / 5p asymmetry are confirmed here by completely independent observation 
in human cancer samples. 
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Other studies have shown that DICER1 hotspot mutations are biallelic 
in cancer, where a disabling mutation acts as the second hit to the enzyme 
[HI El IE] Based on this observation, the relative 5p depletion phenotype of 
RNase III mutants in our analysis suggested that these patients also had a 
second event disabling the other DICER1 allele. To address this question, 
we re-analyzed the sequencing data available for DICER1 mutant cases, 
this time using a different pipeline that can better identify insertions or 
deletions. In a majority of the DICER1 RNase III hotspot mutant samples, 
we were able to identify a secondary disabling genomic event affecting the 
other DICER1 allele (Table S3). Furthermore, we found that these biallelic 
mutated cases had lower 5p abundance than the other DICER1 mutants in 
our earlier analysis. 

Having identified possibly functional mutations in DICER1 and their 
effect on the miRNA profiles, we tested whether these mutations lead to 
functional changes in the mRNA profiles. Others have previously char- 
acterized DICER1 hotspot mutations using mouse-derived cell lines as in 
vitro models [7J EJ [TT] These studies have shown that the mRNA profiles of 
cell lines with different DICER1 RNase Illb hotspot mutations had differ- 
ent mRNA signatures compared to the DICER1 -wild type cell lines. They 
further found an association between the down-regulated miRNAs and their 
differentially-expressed target transcripts, which suggests a differential regu- 
lation of the mRNA levels due to asymmetric miRNA processing in DICER1 
hotspot mutants. 

Although there is in vitro evidence that the asymmetry in the miRNA 
processing lead to significant changes in the mRNA profiles; there are no 
previous reports that describe the differential mRNA expression in accor- 
dance with the miRNA expression data from human tumors. To this end, 
we identified 12 cases across four cancer types that both had RNA-Seq data 
available and carried a hotspot RNase III mutation either in the Ilia or 
Illb domains of the DICER1 protein. We then wanted to check whether 
we could identify a common mRNA expression signature for these DICER1 
RNase III hotspot mutants in comparison to 1212 DICER1 wildtype cases in 
those four cancer studies. For this, we decided to restrict our analysis to the 
Uterine Corpus Endometrial Carcinoma (UCEC) study where the RNA-Seq 
data set contained 8 DICER1 RNase III mutants and 222 DICER1 wild- 
types. We found 10 genes to be significantly up- regulated and none to be 
down-regulated in the hotspot mutated cases when compared to wildtypes 
(p < 0.05 after Bonferroni correction; Table S4). Notably, we found higher 
expression of HMGA2, a well-known oncogene and target of let-7 miRNA 
family, in mutants |14l IT51 [1] . 
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Following up on this, we asked whether the up-regulated genes in mu- 
tants were targets of particular miRNA families. To answer this question, 
we conducted a gene set enrichment analysis (GSEA) using well-known bi- 
ological pathways and well-conserved miRNA family target genes as our 
query gene sets [TH]. Our analysis showed strong enrichment of both let- 
7/98/4458/4500 and miR-17/1 7-5p/20ab/20b-5p/93/106ab/427/518a-3p/519d 
target genes in RNase III mutants (Table [T] FDR < %10). For both fam- 
ilies, 5p strand of the miRNA is the predominant strand and as expected, 
in RNase III mutant cases, 5p-strand miRNAs that belong to these fami- 
lies were relatively down-regulated. Results from the GSEA also suggested 
that there was relatively weaker enrichment for other miRNA families and 
NOTCH-related pathways (Table [TJ FDR < %15). A majority of the en- 
riched gene sets (5 out of 7) represented miRNA family targets, which sug- 
gests the gene expression signature associated with these RNase III hotspot 
mutants is more likely to be mediated by depleted miRNA families rather 
than a common biological pathway. In accordance with the 5p strand de- 
pletion phenotype, a majority of these miRNA families (3 out of 5) were 
5p-strand dominated. For the other two families, miR-29abcd and miR- 
101/101ab, although 3p is the pre-dominant miRNA strand, we saw that 
members of these families were down-regulated as a family in DICER 1 mu- 
tants compared to wildtype, which might be due to an indirect regulatory 
effect of 5p miRNA depletion. 

In summary, we showed that biallelic DICER1 RNase III hotspot mu- 
tations, although infrequent across cancers, lead to relative depletion of 5p 
stand of miRNAs. In addition to known hotspot mutations, we were able 
to identify a previously unknown recurrent DICER1 mutation, S1344, that 
also leads to the 5p depletion phenotype. In accordance with the miRNA 
depletion phenotype, we saw up-regulation of genes that are well-known tar- 
gets of the 5p-dominant miRNA families in mutant samples. It still remains 
unclear whether up-regulation of a particular gene, such as HMGA2, or ac- 
tivation of a particular pathway, such as NOTCH, is contributing to the 
oncogenesis as a result of the 5p miRNA depletion in these cells. 
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Figure 1: Disabling mutations in RNase III domains of DICERl 
lead to 5p miRNA depletion in cancer, a) A majority of the hotspot 
mutations in the RNase III domains of the DICERl are present in the Can- 
cer Genome Atlas project across multiple cancer types, b-c) Hotspot muta- 
tions in the RNase Illb domain cause relative down-regulation of 5p-stand 
and up-regulation of 3p strand miRNAs in mutants compared to DICERl 
wild-types, d) Hotspot mutated samples tend to have relatively lower 5p 
miRNA abundance compared to DICERl wild-type cases. Using sample- 
specific relative 5p abundances, we identified three more DICERl mutated 
cases that also show 5p-depletion phenotype (7715,3 < 0). e Two out of three 
cases, who has relatively low 5p abundance, had a S1344 mutation in the 
RNase Ilia domain that is responsible for processing the 3p strand of the 
miRNA. The mutated amino acid, S1344 in RNase Ilia domain, is homolo- 
gous to T1733 in RNase Illb domain, which in turn is evolutionary coupled 
to the hotspot mutations. This indicates that S1344, although it is in RNase 
Ilia domain, is important for proper functioning of the RNase Illb domain. 
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SI Online Methods 



The code for analyses conducted in this study and supplemental results for 
each of the analyses are available at http:/ /bit.ly/dicer5p. In this study, we 
used miRNA, RNA-Seq and sequencing data from 14 TCGA cancer studies 



(Table SI). 



Table SI: We analyzed a total of 2855 samples with miRNA and 
sequencing data across 14 cancer studies from the Cancer Genome 
Atlas. 



Abbreviation 


Cancer study name 


# of samples 


BLCA 


Bladder urothelial carcinoma 


137 


BRCA 


Breast invasive carcinoma 


190 


COADREAD 


Colorectal adenocarcinoma 


241 


GBM 


Glioblastoma multiforme 


248 


HNSC 


Head and Neck squamous cell carcinoma 


267 


KICH 


Kidney chromophobe 


64 


KIRC 


Kidney renal clear cell carcinoma 


184 


LGG 


Brain lower grade glioma 


286 


LUAD 


Lung adenocarcinoma 


180 


LUSC 


Lung squamous cell carcinoma 


51 


PRAD 


Prostate adenocarcinoma 


248 


STAD 


Stomach adenocarcinoma 


244 


THCA 


Thyroid carcinoma 


399 


UCEC 


Uterine corpus endometrial carcinoma 


116 




Total 


2855 



Sl.l Identification of DICER 1 hotspot mutations 

We first asked whether previously identified DICER 1 hotspot mutations 
at residues E1813, D1810, D1709, E1705 and R1703 ar e present in TCGA 
data sets. For this, we conducted a cross-cancer query on cBioPortal [T7] and 
found 123 out of 5535 sequenced samples to be DICER1 mutated (Figure^ 
and File all_tcga-dicerl-20 14_03_20 .maf) . Of these 123, 12 tumor samples 
had at least one hotspot DICER1 mutation in the RNase Illb domain. 



SI. 2 Analysis of the miRNA-Seq data 

We next wanted to see if hotpost mutant tumors had a distinct miRNA ex- 
pression profile compared to other samples. To address this question, we first 
obtained normalized miRNA-Seq data sets (Level 4) from the most recent 



SI 



Downloaded from http://biorxiv.org/on September 18, 2014 



TCGA analysis runs (January 15, 2014) as generated with the Firehose anal-| 



ysis pipeline, miRNA-Seq data for Glioblastoma Multiforme cancer study 
was not available from this resource, therefore, for GBM, TCGA Level 1 
microarray expression data were processed and normalized using the AgiMi- 
croRna R package and using settings further explained in a previous study 

pans]. 

We then wanted to see whether particular miRNAs were differentially 
expressed in DICER1 RNase Illb mutants compared to DICER1 wild-type 
cases. We initially excluded hotspot mutants from the analysis if they were 
either categorized as hyper- or ultra-mutated, or if the predicted effect of 



the mutation was not high as assigned by the Mutation Assesor (Table S2 ) 
|20j . To check for differential expression, we compared distribution of each 
miRNA expression in mutants versus wildtypes by using a Wilcoxon rank 
sum test. We adjusted the p- values using a Bonferroni correction for multiple 
hypothesis testing. To estimate the change in expression, we calculated 
the difference in median log2-b&sed expression values between mutant and 
wildtype samples (Figure [Tja-c) . 

Table S2: To identify the miRNA expression signature associated 
with hotspot DICER1 mutations, we excluded hyper-mutated 
cases from the initial analysis. Ultra- or hyper-mutated cases tend 
to have higher number of somatic mutations compared to other samples. To 
identify miRNA profiles associated with the hotspot DICER1 mutants in a 
restrict way, we first conducted the differential miRNA expression analysis 
only on samples with relatively low number of somatic mutations (n < 1000). 

Sample identification Reason for exclusion 

TCGA-A6-6141 Hyper-mutated sample 

TCGA-AP-AOLM Low allele frequency and ultra-mutated sample 

TCGA-BS-AOUV Low FIS and ultra-mutated sample 

TCGA-CG-5733 Low FIS and hyper-mutated sample 

TCGA-D1-A17Q Ultra-mutated sample 



To check whether the distribution of differential miRNA expression was 
different for different strands of the miRNA, we conducted pairwise com- 
parisons of the differential expression values for different strands of miRNA: 
5p, 3p and V where means no strand information was available for 
that miRNA. For this comparison, we utilized Wilcoxon rank sum test and 
adjusted the p-values using a Bonferroni correction. 
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SI. 3 Additional mutation calling for DICER1 hotspot mu- 
tants 

Having observed different levels of respective 5p strand depletion in hotspot 
DICER1 mutants, we wanted to see if patients with extreme phenotypes had 
any additional germline or somatic mutations affecting the other DICER1 
allele. We, therefore, downloaded whole-exome binary sequence alignment 
and mapping (BAM) files for normal and tumor samples corresponding to 
the hotspot DICER1 mutated cases from CGHub. We then used Haplo- 
typeCaller utility from the Genome Analysis Toolkit to do the joint variant 
calling on these BAM files [21J . To annotate the variants, we used Mutation 



Assesor and Oncotator tools 

We next used the annotated mutation file to look for new mutations that 
were not called by the TCGA pipeline (File: muts_tcga-dicerl-secondcall- 
2014_04_09.maf). In addition to the previously called hotspot mutations, 
we were able to identify other disabling DICER1 alterations in samples that 



showed relatively low 5p strand abundance (Table S3) 



SI. 4 Identification of evolutionary couplings in RNase III 
domain 

In our miRNA expression analysis, in which we estimated the relative 5p 
strand abundance for each patient, we saw that two samples that have the 
biallelic S1344L mutation had considerably low 5p abundance. Based on 
the fact that RNase III dimerization is necessary for proper DICER1 func- 
tioning, we wanted to see how S1344L could affect 5p miRNA processing 
|13j . For this we ran evolutionary couplings (ECs) analysis with default 
settings on the EVFold server (vl.ll) [12J. We provided [DICER_HUMAN 
| (UniProt : Q9UPY3) | as the input protein, residues 1423-1922 of DICER1 as 
the sequence of interest to center the RNase Illb domain and PDB:2ebl as 
the reference structure [10]. We set the e-value for jackhmmer as 10~ 10 and 
the inference method for determining the evolutionary couplings as Pseudo 
Likehood Maximization (PLM). 

The analysis showed that the most strongly constrained residues (with 
strong couplings to other residues) were 1708, 1709, 1813, 1705 and 1704. 
The contact maps were fairly structured, indicating they were of reasonable 

quality (File: EvCouplings_DICERl RNaseIIIb_with_2ebl.zip). Well-known 

active site residues with relatively high EC strength included 1709, 1813 and 
1705. We found that residues 1709, 1813 and 1705 were coupled to 1733. 
These ECs, however, were not consistent with the known structural con- 
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Table S3: Hotspot DICER1 mutations that lead to 5p depletion 
phenotype are biallelic in TCGA samples. For the majority of the 
hotspot DICER1 mutants, we were able to identify a second genomic event 
that affect the other DICER1 allele. These biallelic mutated samples were 
enriched for stronger 5p depletion phenotype (i.e. lower 7715,3) compared 
to monoallelic alterations. THCA: Thyroid carcinoma; UCEC: Uterine cor- 
pus endometrial carcinoma; GBM: Glioblastoma multiforme; COADREAD: 
Colorectal adenocarcinoma; CNA: Copy number alteration; HetLoss: Het- 
erozygous loss; N/A: Not available. 



Sample identifier 


Cancer study 


Mutation 


CNA 


1™5,3 


TCGA-EL-A3GO 


THCA 


D1810H, K376fs 




-1.43 


TCGA-D1-A15Z 


UCEC 


D1810A, L539fs 




-1.08 


TCGA-EL-A3D5 


THCA 


E1813G, L81fs 




-1.05 


TCGA-DI-AOWH 


UCEC 


D1709N, M1821I, 
K1486fs 




-1.02 


TCGA-06-2569 


GBM 


E1705Q, 
CLPSIL1053del 


Gain 


-0.93 


TCGA-A5-A0GN 


UCEC 


S1344L 


HetLoss 


-0.92 


TCGA-14-0871 


GBM 


Homozygous E1813G 




-0.83 


TCGA-A6-6652 


COADREAD 


D1810N 


HetLoss 


-0.71 


TCGA-B5-A11U 


UCEC 


S1344L, P1377fs 




-0.63 


TCGA-D1-A17Q 


UCEC 


E1705K, H341P 




-0.36 


TCGA-AP-AOLM 


UCEC 


E1705A, R490H, 
F1650C 




0.36 


TCGA-DM-A28C 


COADREAD 


E1705Q 




0.48 


TCGA-A5-A0GH 


UCEC 


E1813G, V1731fs 




N/A 


TCGA-BG-A0M6 


UCEC 


E1813A 




N/A 


TCGA-D1-A0ZP 


UCEC 


R1703C 




N/A 



straints as 1709, 1813 and 1705 were not in close proximity to 1733 in the 
3D structure (19.60±2.62A distance). 

A multi-alignment involving both RNase Ilia and Illb domains indicated 
that SI 344 in RNase Ilia domain was homologous to 1733 in RNase Illb 
domain. We then inspected the corresponding locations of these residues in 
the 3D protein structure and found that ECs from residues 1709, 1813 and 
1705 to 1733 were better explained in the RNase Illb dimer context, where 
active site residues in one domain were closer (11.72±1.98A distance) to the 
1733 (i.e. S1344) in the other domain. Based on these observations, we 
concluded that these couplings might indicate an important role for S1344, 
together with other active site residues (1709, 1813, 1705) in RNase Illb 
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domain, in 5p strand processing. 



SI. 5 Analysis of the RNA-Seq data 

We next asked whether DICER1 hotspot mutants had distinct gene expres- 
sion profiles compared to other samples. To answer this question, similar 
to miRNA data, we obtained processed and normalized RNA-Seq data sets 
(Level 4) from the most recent TCGA analysis runs (January 15, 2014) 
as generated with the Firehose analysis pipeline, We found that THCA, 
GBM, COADREAD studies had RNA-Seq data for less three hotspot mu- 
tants, hindering a statistically robust comparison. We, therefore, decided 
to restrict our analysis to only UCEC study, where there were 8 DICER1 
hotspot mutant and 222 DICER1 wildtype samples. 

We then conducted a differential gene expression analysis using the 
limma voom R package on the gene-level RSEM counts for UCEC study 
and contrasted the hotspot mutant to wildtype samples [22]. We found 
9 genes to be significantly up-regulated-and none down-regulated-in mu- 



tants (p < 0.05 after Bonferroni correction; Table S4 File: DGE-UCEC- 
muts_vs_wts-allGenes. tsv) . 

Table S4: A differential gene expression analysis comparing 
DICER1 hotspot mutants to wildtypes showed 9 significantly up- 
regulated genes in mutants. We compared the gene expression levels in 
8 DICER1 mutants to the levels in 222 DICER1 wildtypes using the limma 
voom toolkit. We used Bonferroni correction to adjust our p- values for mul- 
tiple hypothesis testing and found 9 genes to be differentially up-regulated 
in mutants (p a dj < 0.05). logFC: change in gene expression (log based) 



Gene 


Gene ID 


logFC 


p- value 


adjusted p- value 


HMGA2 


8091 


3.708 


0.0000000001 


0.0000016619 


IGDCC3 


9543 


3.648 


0.0000000025 


0.0000409144 


ACVR2B 


93 


1.211 


0.0000000083 


0.0001365400 


MMP16 


4325 


2.333 


0.0000002521 


0.0041232946 


C17orf63 


55731 


0.782 


0.0000002798 


0.0045772958 


ADAMTS7 


11173 


1.993 


0.0000007622 


0.0124675442 


IGF2BP2 


10644 


3.294 


0.0000015289 


0.0250102395 


FAM171B 


165215 


1.801 


0.0000021387 


0.0349852741 


MGAT5B 


146664 


2.875 


0.0000023541 


0.0385090592 
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SI. 6 Gene set enrichment analysis (GSEA) 

Having observed up-regulated genes in DICER1 hotspot cases compared to 
wildtypes, we wanted to see whether these genes were targets of particular 
miRNAs or members of canonical pathways. To answer this question, we 
utilized a gene set enrichment analysis (GSEA) using the UCEC data set. 

To create gene sets for targets of the well-conserved miRNA families, 
we first downloaded predicted miRNA targets from TargetScan (Release 
6.2) and then aggregated these predictions using miRNA family-member 
associations to obtain a list of targets for each miRNA family [23 j . We next 
filtered out predictions with conservation score lower than 90% and then 
collected targets that were in the upper 5 percentile considering their context 
score (i.e. scores lower than —0.3555). Using these filtered predictions, we 
created gene sets that were compatible with the conventional GSEA analysis 

We combined these miRNA target gene sets with gene sets represent- 
ing well-known and curated Reactome pathways from MSigDB \2A\ |2"5] . 
This gave us a total of 719 gene sets, consisting of 674 gene sets for path- 
ways and 45 for targets of miRNA families (File: GSEA-GeneSymbols- 
mirFamilies_and_Pathways.gmt). For the GSEA, we utilized the vomer 
utility from the limma toolkit and used the contrast model that we used in 
the RNA-Seq data analysis We set the number of rotations to 10,000 
and for each gene set, tested whether the genes in the set were enriched for 
any direction (up- or down-regulation). 

We found genes in 7 different sets to be significantly enriched towards 
up-regulation and none in the reverse direction (FDR < 0.15; Table [TJ File: 
GSEA-UCEC-muts_vs_wts.tsv). 5 out of 7 gene sets were representing 
target genes for miRNA families and 3 of these were miRNA families for 
which 5p strand was the predominant strand according to miRBase |27j . 
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